Practical Collapsed Stochastic Variational Inference for the HDP

Author

  • Arnim Bleier
Abstract

Recent advances have made it feasible to apply the stochastic variational paradigm to a collapsed representation of latent Dirichlet allocation (LDA). While the stochastic variational paradigm has been applied successfully to an uncollapsed representation of the hierarchical Dirichlet process (HDP), no attempt to apply this type of inference in a collapsed setting of non-parametric topic modeling has been put forward so far. In this paper we explore such a collapsed stochastic variational Bayes inference for the HDP. The proposed online algorithm is easy to implement and accounts for the inference of hyper-parameters. First experiments show a promising improvement in predictive performance.

1 Background

We begin by considering a model where each document d is a mixture θ_d of K discrete topic distributions φ_k over a vocabulary of V terms. Let z_di ∈ {1, ..., K} denote the topic of the i-th word w_di ∈ {1, ..., V} in document d ∈ {1, ..., D}, and place Dirichlet priors on the parameters θ_d and φ_k. We have

  z_di | θ_d ∼ Discrete(θ_d),  θ_d ∼ Dirichlet(απ),
  w_di | z_di, {φ_k} ∼ Discrete(φ_{z_di}),  φ_k ∼ Dirichlet(β),

where π is the top-level distribution over topics, and α and β are concentration parameters. While the number of topics K is fixed in LDA, we want the model to determine the number of topics needed. Consequently, we follow the assumption made by the HDP [1] of a countably infinite number of topics, of which only a finite number is used in the posterior. Our prior π is constructed by a truncated stick-breaking process [2],
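The generative model described in the Background section can be illustrated with a short simulation. The sketch below is not taken from the paper: the excerpt stops before the stick-breaking construction is given, so the code assumes the common form in which each stick proportion is drawn from Beta(1, γ) and the last of the K sticks absorbs the remaining mass. The concentration γ and the function names `truncated_stick_breaking`, `dirichlet`, and `generate_corpus` are hypothetical, and topics and terms are indexed from 0 rather than 1.

```python
import random


def truncated_stick_breaking(gamma, K, rng):
    """Top-level topic weights pi of length K that sum to 1.

    Assumption (not stated in the excerpt): v_k ~ Beta(1, gamma),
    with the final stick set to 1 so the truncation is exact.
    """
    pi, remaining = [], 1.0
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, gamma)
        pi.append(v * remaining)   # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - v
    return pi


def dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalizing Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [x / total for x in g]


def generate_corpus(D, N, V, K, alpha, beta, gamma, seed=0):
    """Sample D documents of N words each from the model in the text."""
    rng = random.Random(seed)
    pi = truncated_stick_breaking(gamma, K, rng)
    # phi_k ~ Dirichlet(beta): one term distribution per topic
    phi = [dirichlet([beta] * V, rng) for _ in range(K)]
    docs = []
    for _ in range(D):
        # theta_d ~ Dirichlet(alpha * pi): document-level topic mixture
        theta = dirichlet([alpha * p for p in pi], rng)
        # z_di ~ Discrete(theta_d), then w_di ~ Discrete(phi_{z_di})
        z = rng.choices(range(K), weights=theta, k=N)
        w = [rng.choices(range(V), weights=phi[k])[0] for k in z]
        docs.append(w)
    return pi, phi, docs
```

Calling, for example, `generate_corpus(D=3, N=20, V=30, K=5, alpha=5.0, beta=0.5, gamma=1.0)` returns the top-level weights, the topic-term distributions, and three lists of term indices. Because later sticks receive geometrically shrinking mass, most documents concentrate on a few topics, which is the sense in which only a finite number of topics is effectively used.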

Similar resources

Stochastic Variational Inference for the HDP-HMM

We derive a variational inference algorithm for the HDP-HMM based on the two-level stick-breaking construction. This construction has previously been applied to the hierarchical Dirichlet process (HDP) for mixed membership models, allowing for efficient handling of the coupled weight parameters. However, the same algorithm is not directly applicable to HDP-based infinite hidden Markov models ...

Stochastic Variational Inference for HMMs, HSMMs, and Nonparametric Extensions

Hierarchical Bayesian time series models can be applied to complex data in many domains, including data arising from behavior and motion [32, 33], home energy consumption [60], physiological signals [69], single-molecule biophysics [71], brain-machine interfaces [54], and natural language and text [44, 70]. However, for many of these applications there are very large and growing datasets, and s...

Collapsed Variational Bayesian Inference for PCFGs

This paper presents a collapsed variational Bayesian inference algorithm for PCFGs that has the advantages of two dominant Bayesian training algorithms for PCFGs, namely variational Bayesian inference and Markov chain Monte Carlo. In three kinds of experiments, we illustrate that our algorithm achieves close performance to the Hastings sampling algorithm while using an order of magnitude less t...

Collapsed Variational Inference for HDP

A wide variety of Dirichlet-multinomial ‘topic’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of is...

The Discrete Infinite Logistic Normal Distribution

We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN generalizes the hierarchical Dirichlet process (HDP) to model correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables and study its statistical...

Journal title:
  • CoRR

Volume abs/1312.0412

Publication date 2013